Session 3: Statistical modeling and machine learning
Department of Econometrics and Business Statistics
Functions such as lm, glm, lmer, glmmTMB … allow us to fit models
e.g. fit french fry rating with respect to all designed main effects:
summary, str, resid, fitted, coef, … allow us to extract different parts of a model for a linear model. Other model functions work differently … major headaches!
Call:
lm(formula = rating ~ type + treatment + time + subject, data = ff_long)
Residuals:
Min 1Q Median 3Q Max
-7.073 -1.967 -0.464 1.520 10.219
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.1985 0.2579 4.65 3.5e-06 ***
typegrassy -1.1551 0.1557 -7.42 1.5e-13 ***
typepainty 0.7005 0.1558 4.50 7.1e-06 ***
typepotato 5.1332 0.1557 32.96 < 2e-16 ***
typerancid 2.0329 0.1557 13.06 < 2e-16 ***
treatment2 -0.0513 0.1205 -0.43 0.670
treatment3 -0.0455 0.1206 -0.38 0.706
time2 0.1019 0.2161 0.47 0.637
time3 -0.0136 0.2161 -0.06 0.950
time4 -0.1025 0.2161 -0.47 0.635
time5 -0.1330 0.2169 -0.61 0.540
time6 -0.0461 0.2161 -0.21 0.831
time7 -0.2462 0.2163 -1.14 0.255
time8 -0.1163 0.2166 -0.54 0.591
time9 0.1319 0.2278 0.58 0.563
time10 0.5463 0.2278 2.40 0.017 *
subject10 1.7145 0.2439 7.03 2.5e-12 ***
subject15 -0.3591 0.2449 -1.47 0.143
subject16 0.4752 0.2441 1.95 0.052 .
subject19 2.0165 0.2439 8.27 < 2e-16 ***
subject31 1.4928 0.2510 5.95 3.0e-09 ***
subject51 1.8728 0.2439 7.68 2.1e-14 ***
subject52 0.1948 0.2439 0.80 0.424
subject63 0.9605 0.2439 3.94 8.4e-05 ***
subject78 -0.5828 0.2439 -2.39 0.017 *
subject79 -0.5370 0.2503 -2.15 0.032 *
subject86 0.4354 0.2510 1.73 0.083 .
---
Signif. codes:
0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.9 on 3444 degrees of freedom
(9 observations deleted due to missingness)
Multiple R-squared: 0.4, Adjusted R-squared: 0.395
F-statistic: 88.1 on 26 and 3444 DF, p-value: <2e-16
The package broom (broom.mixed for glmmTMB) gets model results into a tidy format at different levels
broom::glance (values such as AIC, BIC, model fit, …)broom::tidy (estimate, confidence interval, significance level, …)broom::augment (fitted values, residuals, predictions, influence, …)glancetidy# A tibble: 6 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 1.20 0.258 4.65 3.50e- 6
2 typegrassy -1.16 0.156 -7.42 1.49e- 13
3 typepainty 0.701 0.156 4.50 7.11e- 6
4 typepotato 5.13 0.156 33.0 2.28e-207
5 typerancid 2.03 0.156 13.1 4.69e- 38
6 treatment2 -0.0513 0.121 -0.426 6.70e- 1
augmentRows: 3,471
Columns: 12
$ .rownames <chr> "1", "2", "3", "4", "5", "6", "7", "8",…
$ rating <dbl> 2.9, 0.0, 0.0, 0.0, 5.5, 14.0, 0.0, 0.0…
$ type <chr> "potato", "buttery", "grassy", "rancid"…
$ treatment <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ time <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ subject <fct> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 10, 10, 1…
$ .fitted <dbl> 6.332, 1.199, 0.043, 3.231, 1.899, 6.33…
$ .resid <dbl> -3.432, -1.199, -0.043, -3.231, 3.601, …
$ .hat <dbl> 0.0079, 0.0079, 0.0079, 0.0079, 0.0079,…
$ .sigma <dbl> 2.9, 2.9, 2.9, 2.9, 2.9, 2.9, 2.9, 2.9,…
$ .cooksd <dbl> 4.2e-04, 5.1e-05, 6.7e-08, 3.7e-04, 4.6…
$ .std.resid <dbl> -1.188, -0.415, -0.015, -1.119, 1.247, …
Add random intercepts for each subject
The broom.mixed package provides similar functionality to mixed effects models as broom does for fixed effects models
broom.mixed on each level: model, parameters, and dataff_long data with the model diagnostics05:00 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.